Class Imbalance and Active Learning
Abstract
The rich history of predictive modeling has culminated in a diverse set of techniques capable of making accurate predictions on many real-world problems. Many of these techniques require training, whereby a set of instances with ground-truth labels (values of a dependent variable) is observed by a model-building process that attempts to capture, at least in part, the relationship between the features of the instances and their labels. The resulting model can then be applied to instances whose labels are unknown, to estimate or predict those labels. These predictions depend not only on the functional structure of the model itself but also on the particular data on which the model was trained. The accuracy of the predicted labels depends heavily on the model’s ability to capture an unbiased and sufficient understanding of the characteristics of the different classes. In problems where the prevalence of the classes is imbalanced, it is necessary to prevent the resulting model from being skewed toward the majority class and to ensure that it reflects the true nature of the minority class.
Another consequence of class imbalance is observed in domains where the ground-truth labels are not available beforehand and must be gathered on demand at some cost. The costs associated with collecting labels may stem from human labor or from costly incentives, interventions, or experiments. In these settings, simply labeling all available instances may not be practicable, whether because of budgetary constraints or simply a strong desire to be cost-efficient. As in predictive modeling with imbalanced classes, the goal here is to ensure that the budget is not predominantly spent on acquiring labels for majority-class instances, and that the set of instances selected for labeling contains a comparable number of minority-class instances.
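The budget concern can be made concrete with a small pool-based active learning sketch. This is a minimal illustration under stated assumptions, not the chapter's method: the pool is a hypothetical one-dimensional dataset of 950 majority and 50 minority instances drawn from two Gaussians, the model is a nearest-centroid threshold retrained after every query, and labeling starts from one seed instance per class. All helper names (`make_pool`, `threshold`, `seed_indices`, `uncertainty_sampling`, `random_sampling`, `minority_share`) are hypothetical. Uncertainty sampling queries the unlabeled instance closest to the current decision boundary, and the script reports what share of the purchased labels belong to the minority class, next to a random-sampling baseline with the same budget.

```python
import random
from statistics import mean


# Hypothetical 1-D pool: label 0 = majority, label 1 = minority.
# The 950:50 ratio, the Gaussian means, and the seed are illustrative assumptions.
def make_pool(n_major=950, n_minor=50, seed=0):
    rng = random.Random(seed)
    pool = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)]
    pool += [(rng.gauss(3.0, 1.0), 1) for _ in range(n_minor)]
    rng.shuffle(pool)
    return pool


def threshold(labeled):
    """Nearest-centroid model: the boundary is the midpoint of the two class means."""
    mean0 = mean([x for x, y in labeled if y == 0])
    mean1 = mean([x for x, y in labeled if y == 1])
    return (mean0 + mean1) / 2.0


def seed_indices(pool):
    """Assumed starting point: the first instance of each class in pool order."""
    first0 = next(i for i, (_, y) in enumerate(pool) if y == 0)
    first1 = next(i for i, (_, y) in enumerate(pool) if y == 1)
    return [first0, first1]


def uncertainty_sampling(pool, budget):
    """Query, one at a time, the unlabeled instance closest to the boundary."""
    labeled = seed_indices(pool)
    unlabeled = set(range(len(pool))) - set(labeled)
    queried = []
    for _ in range(budget):
        t = threshold([pool[i] for i in labeled])  # retrain on all labels so far
        # Smallest |x - t| = least confident prediction; sorted() fixes tie order.
        best = min(sorted(unlabeled), key=lambda i: abs(pool[i][0] - t))
        unlabeled.remove(best)
        labeled.append(best)  # the oracle reveals pool[best][1]
        queried.append(best)
    return queried


def random_sampling(pool, budget, seed=1):
    """Baseline: spend the same budget on uniformly random unlabeled instances."""
    rng = random.Random(seed)
    candidates = sorted(set(range(len(pool))) - set(seed_indices(pool)))
    return rng.sample(candidates, budget)


def minority_share(pool, indices):
    """Fraction of the purchased labels that belong to the minority class."""
    return sum(pool[i][1] for i in indices) / len(indices)


if __name__ == "__main__":
    pool = make_pool()
    budget = 50
    print("uncertainty sampling:", minority_share(pool, uncertainty_sampling(pool, budget)))
    print("random sampling:     ", minority_share(pool, random_sampling(pool, budget)))
```

The intuition is that queries concentrated near the boundary, where the classes overlap, are less dominated by easy majority-class instances than a uniform random draw. This is not guaranteed: the outcome depends on the seed, the class ratio, the class separation, and the model, so the printed shares are something to check rather than a result to rely on.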
In the context of learning from imbalanced datasets, the role of active learning can be viewed from two different perspectives. The first perspective considers the case where the labels for all the examples in a reasonably large, imbalanced dataset are readily available. The role of active learning in this case is to reduce, and potentially eliminate, any adverse effects that the class imbalance can have on the model’s generalization performance. The other perspective addresses the setting where we have prior knowledge that the dataset is imbalanced, and we would like to employ active
Similar resources
The Role of Class Scale in Promotion of Students’ Participation in Active Learning Process (Case Study: Male Students of a Secondary School in Shiraz)
Perception and experience gained in the contemporary school have not supported human beings' active learning. Overall, participation is the main element of active learning; thus, the active participation of students in the learning process is emphasized in secondary-school education. Given the importance of active learning, in this paper, the effective components in this type of...
MMDT: Multi-Objective Memetic Rule Learning from Decision Tree
In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretability are two conflicting measures. In this approach, we consider both the accuracy and the interpretability of rule sets. Additionally, individual classifiers face other problems such as huge sizes, high dimensionality, and imbalanced class distributions in data sets. This...
Adaptive Resampling with Active Learning
This paper proposes a novel algorithm, Virtual Instances Resampling Technique Using Active Learning (VIRTUAL), for the class imbalance problem in Support Vector Machine (SVM) learning. In supervised learning, the prediction performance of classification algorithms deteriorates when the training set is imbalanced. The class imbalance problem occurs when at least one of the classes is represented by substa...
Ensemble-based active learning for class imbalance problem
In medical diagnosis, the problem of class imbalance is common. Although unlabeled data are abundant, labeled data are very difficult and expensive to obtain. In this paper, an ensemble-based active learning algorithm is proposed to address the class imbalance problem. Artificial data are created according to the distribution of the training dataset to make the ensemble diverse, and the...
Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem
In this paper, we analyze the effect of resampling techniques, including under-sampling and over-sampling, used in active learning for word sense disambiguation (WSD). Experimental results show that under-sampling has negative effects on active learning, whereas over-sampling is a relatively good choice. To alleviate the within-class imbalance problem of over-sampling, we propose a bootstrap-based ...
Breast Cancer Diagnosis from Perspective of Class Imbalance
Introduction: Breast cancer is the second leading cause of mortality among women. Early detection is the only way to reduce the risk of breast cancer mortality. Traditional methods cannot effectively diagnose tumors, since they are based on the assumption of a well-balanced dataset. However, a hybrid method can help to alleviate the two-class imbalance problem existing in the ...